Graph-based ETL Processes for Warehousing Statistical Open Data

Authors

  • Alain Berro
  • Imen Megdiche
  • Olivier Teste
Abstract

Warehousing is a promising means to cross and analyse Statistical Open Data (SOD). However, extracting structures, integrating, and defining multidimensional schemas from several scattered and heterogeneous tables in the SOD are major problems that challenge traditional ETL (Extract-Transform-Load) processes. In this paper, we present a three-step ETL process that relies on RDF graphs to address these problems. In the first step, we automatically extract table structures and values using a table anatomy ontology. This phase converts structurally heterogeneous tables into a unified RDF graph representation. The second step performs a holistic integration of several semantically heterogeneous RDF graphs. The optimal integration is computed through an Integer Linear Program (ILP). In the third step, the system interacts with users to incrementally transform the integrated RDF graph into a multidimensional schema.
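To make the first step more concrete, the following is a minimal sketch of how a small two-dimensional statistical table might be flattened into subject-predicate-object triples, the basic building block of an RDF graph. It is an illustration only: the `ex:` vocabulary, the function name `table_to_triples`, and the toy figures are assumptions made here, and do not reproduce the table anatomy ontology or the extraction algorithm of the paper.

```python
# Hypothetical vocabulary: every "ex:" term below is illustrative only and is
# NOT taken from the table anatomy ontology described in the paper.
def table_to_triples(table_id, row_headers, col_headers, values):
    """Flatten a 2-D table into (subject, predicate, object) triples."""
    triples = [(table_id, "rdf:type", "ex:Table")]
    for i, row in enumerate(row_headers):
        for j, col in enumerate(col_headers):
            # One node per numeric cell, linked to its table and both headers.
            cell = f"{table_id}/cell/{i}/{j}"
            triples.append((cell, "rdf:type", "ex:NumericCell"))
            triples.append((cell, "ex:inTable", table_id))
            triples.append((cell, "ex:hasRowHeader", row))
            triples.append((cell, "ex:hasColumnHeader", col))
            triples.append((cell, "ex:hasValue", values[i][j]))
    return triples


# Toy data, made up for this example.
triples = table_to_triples(
    "ex:table1",
    row_headers=["France", "Spain"],
    col_headers=["2013", "2014"],
    values=[[10, 12], [7, 9]],
)
```

Because every table ends up in the same triple shape regardless of its original layout, graphs produced this way could in principle be compared and merged, which is the role the abstract assigns to the second, integration step.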

Similar resources

BPMN-Based Conceptual Modeling of ETL Processes

Business Intelligence (BI) solutions require the design and implementation of complex processes (denoted ETL) that extract, transform, and load data from the sources to a common repository. New applications, such as real-time data warehousing, require agile and flexible tools that allow BI users to take timely decisions based on extremely up-to-date data. This calls for new ETL tools ...

A Semantic Approach towards CWM-based ETL Processes

Nowadays, although built on a common standard for metadata representation and interchange in data warehouse environments, Common Warehouse Metamodel (CWM)-based ETL processes still face significant challenges in semantically and systematically integrating heterogeneous sources into the data warehouse. In this context, we focus on proposing an ontology-based ETL framework for covering...

A Content-Driven ETL Processes for Open Data

The emerging statistical Open Data (OD) seems very promising for generating various analysis scenarios for decision-making systems. Nevertheless, OD has problematic characteristics such as semantic and structural heterogeneity, lack of schemas, autonomy and dispersion. These characteristics shake the traditional Extract-Transform-Load (ETL) processes, since the latter generally deal with wel...

Open User Involvement in Data Cleaning for Data Warehouse Quality

A high-quality data warehouse is key to making smart strategic decisions. Data cleaning is a program that deals with quality problems in data extracted from operational sources before they are loaded into the data warehouse. Because data cleaning can introduce errors and some data require manual cleaning, there is a need for open user involvement in data cleaning for data warehouse...

XML based Framework for ETL Processes For Relational Databases

In data warehousing, Extraction-Transformation-Loading (ETL) tasks are the key tasks responsible for extracting data from several sources, cleansing and customizing them, and inserting them into the data warehouse [10]. More specifically, ETL tools are a category of specialized tools that deal with data warehouse cleaning and loading problems. These tasks are very critical in every d...

Publication date: 2015